AITopics | probabilistic transformer

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

Neural Information Processing SystemsDec-24-2025, 23:22:15 GMT

Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions. We show the benefits of our approach (1) on a synthetic task that captures the ability to learn a hidden data distribution, (2) with state-of-the-art results in RNA folding that reveal advantages on highly ambiguous data, and (3) demonstrating its generative capabilities on property-based molecule design by implicitly learning the underlying distributions and outperforming existing work.

modelling ambiguity and distribution, probabilistic transformer, rna folding and molecule design, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Probabilistic Transformer For Time Series Analysis

Neural Information Processing SystemsDec-24-2025, 21:32:35 GMT

Generative modeling of multivariate time series has remained challenging partly due to the complex, non-deterministic dynamics across long-distance timesteps. In this paper, we propose deep probabilistic methods that combine state-space models (SSMs) with transformer architectures. In contrast to previously proposed SSMs, our approaches use attention mechanism to model non-Markovian dynamics in the latent space and avoid recurrent neural networks entirely. We also extend our models to include several layers of stochastic variables organized in a hierarchy for further expressiveness. Compared to transformer models, ours are probabilistic, non-autoregressive, and capable of generating diverse long-term forecasts with uncertainty estimates. Extensive experiments show that our models consistently outperform competitive baselines on various tasks and datasets, including time series forecasting and human motion prediction.

name change, probabilistic transformer, time series analysis, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Add feedback

Probabilistic Transformer For Time Series Analysis

Neural Information Processing SystemsJan-19-2025, 02:49:54 GMT

Generative modeling of multivariate time series has remained challenging partly due to the complex, non-deterministic dynamics across long-distance timesteps. In this paper, we propose deep probabilistic methods that combine state-space models (SSMs) with transformer architectures. In contrast to previously proposed SSMs, our approaches use attention mechanism to model non-Markovian dynamics in the latent space and avoid recurrent neural networks entirely. We also extend our models to include several layers of stochastic variables organized in a hierarchy for further expressiveness. Compared to transformer models, ours are probabilistic, non-autoregressive, and capable of generating diverse long-term forecasts with uncertainty estimates.

probabilistic transformer, time series analysis

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

Neural Information Processing SystemsJan-18-2025, 12:28:42 GMT

Our world is ambiguous and this is reflected in the data we use to train our algorithms. This is particularly true when we try to model natural processes where collected data is affected by noisy measurements and differences in measurement techniques. Sometimes, the process itself is ambiguous, such as in the case of RNA folding, where the same nucleotide sequence can fold into different structures. This suggests that a predictive model should have similar probabilistic characteristics to match the data it models. Therefore, we propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer, to accommodate ambiguities and data distributions.

modelling ambiguity and distribution, probabilistic transformer, rna folding and molecule design, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Add feedback

Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

Wu, Haoyi, Tu, Kewei

arXiv.org Artificial IntelligenceNov-26-2023

Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representation of words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as dependency arcs between them; and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively to transformers on small to medium sized datasets. We hope that our work could help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically-principled neural approaches in the future.

probabilistic transformer, representation, transformer, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.findings-acl.482

2311.15211

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Small Transformers Compute Universal Metric Embeddings

Kratsios, Anastasis, Debarnot, Valentin, Dokmanić, Ivan

arXiv.org Machine LearningOct-18-2022

We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called \emph{probabilistic transformers}. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-H\"{o}lder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees, which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications, we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs. When instead embedding into multivariate Gaussian mixtures, we show that probabilistic transformers can compute bi-H\"{o}lder embeddings with arbitrarily small distortion.

artificial intelligence, machine learning, metric space, (18 more...)

arXiv.org Machine Learning

2209.06788

Country:

North America > Canada > Ontario > Hamilton (0.14)
Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Israel (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Probabilistic Transformers

Movellan, Javier R.

arXiv.org Machine LearningNov-12-2020

We show that Transformers are Maximum Posterior Probability estimators for Mixtures of Gaussian Models. This brings a probabilistic point of view to Transformers and suggests extensions to inference-time model adaptation and to other probabilistic cases.

pixel, probabilistic transformer, transformer, (15 more...)

arXiv.org Machine Learning

2010.15583

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Add feedback

Collaborating Authors

probabilistic transformer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

Probabilistic Transformer For Time Series Analysis

Probabilistic Transformer For Time Series Analysis

Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design

Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

Small Transformers Compute Universal Metric Embeddings

Probabilistic Transformers